
xen: sched_credit: define and use curr_on_cpu(cpu)

To fetch `per_cpu(schedule_data,cpu).curr' in a more readable
way. It's in sched-if.h as that is where `struct schedule_data'
is declared.
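A minimal sketch of what such a readability macro can look like. The per-CPU machinery here (a plain array plus a `per_cpu()` stand-in) and the trimmed `struct vcpu` are hypothetical simplifications, not Xen's real definitions:

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS 4                       /* hypothetical CPU count */

struct vcpu { int vcpu_id; };           /* trimmed stand-in */

struct schedule_data {
    struct vcpu *curr;                  /* vCPU currently running on this pCPU */
};

/* Simplified per-CPU storage: one array slot per CPU (assumption). */
static struct schedule_data per_cpu__schedule_data[NR_CPUS];
#define per_cpu(var, cpu) (per_cpu__##var[(cpu)])

/* Readability helper: the vCPU currently running on pCPU c. */
#define curr_on_cpu(c) (per_cpu(schedule_data, c).curr)
```

Because the macro expands to an lvalue, it can be used both to read and to assign the current vCPU.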

Signed-off-by: Dario Faggioli <dario.faggioli@citrix.com>
Acked-by: George Dunlap <george.dunlap@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>

usbif: drop bogus definition

Just like recently done for vSCSI, remove a backend implementation
detail from the interface header.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

add maintainers entry for vendor-independent IOMMU code

As agreed to last week.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

libxenstore: filter watch events in libxenstore when we unwatch

XenStore queues incoming watch events from a thread and notifies the user.
Sometimes xs_unwatch is called before all related messages have been read.
The use case is non-threaded libevent, where we have two events, A and B:
- Event A will destroy something and call xs_unwatch;
- Event B is used to notify that a node has changed in XenStore.
As the events are handled one by one, event A can be handled before event B.
So on the next xs_watch_read the user could retrieve an unwatched token, and
a segfault occurs if the token stores a pointer to the structure
(i.e. "backend:0xcafe").

To avoid problems with existing applications using libxenstore, this
behaviour is only enabled if XS_UNWATCH_FILTER is given to xs_open.
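The filtering idea can be sketched as dropping queued events whose token belongs to the watch being removed. The `struct watch_event` layout and `filter_unwatched()` are hypothetical, and this sketch matches on the token only; the real libxenstore implementation may also compare the watched path:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical queued watch event: fired path plus the token given to xs_watch. */
struct watch_event {
    const char *path;
    const char *token;
};

/*
 * Remove every queued event carrying the token being unwatched, compacting
 * the queue in place and preserving the order of the remaining events.
 * Returns the new queue length.
 */
static size_t filter_unwatched(struct watch_event *queue, size_t len,
                               const char *token)
{
    size_t keep = 0;

    for (size_t i = 0; i < len; i++) {
        if (strcmp(queue[i].token, token) != 0)
            queue[keep++] = queue[i];
    }
    return keep;
}
```

With this in place, a later xs_watch_read cannot hand back a token (such as an encoded pointer) for a watch the caller has already torn down.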

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Julien Grall <julien.grall@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>

x86/kexec: Change NMI and MCE handling on kexec path

Experimentally, certain crash kernels will triple fault very early
after starting if started with NMIs disabled.  This was discovered
when experimenting with a debug keyhandler which deliberately created
a reentrant NMI, causing stack corruption.

Because of this bug, and because future changes to the NMI handling will
make the kexec path more fragile, take the time now to bullet-proof the
kexec behaviour so that it is safe in more circumstances.

This patch adds three new low level routines:
* nmi_crash
    This is a special NMI handler for use during a kexec crash.
* enable_nmis
    This function enables NMIs by executing an iret-to-self, to
    disengage the hardware NMI latch.
* trap_nop
    This is a no-op handler which irets immediately.  It is not declared
    with ENTRY() to avoid the extra alignment overhead.

And adds three new IDT entry helper routines:
* _write_gate_lower
    This is a substitute for using cmpxchg16b to update a 128bit
    structure at once.  It assumes that the top 64 bits are unchanged
    (and ASSERT()s the fact) and performs a regular write on the lower
    64 bits.
* _set_gate_lower
    This is functionally equivalent to the already present
    _set_gate(), except it uses _write_gate_lower rather than updating
    both 64bit values.
* _update_gate_addr_lower
    This is designed to update an IDT entry handler only, without
    altering any other settings in the entry.  It also uses
    _write_gate_lower.
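The trick behind _write_gate_lower can be sketched as follows: a 16-byte IDT gate is treated as two 64-bit halves, and when the upper half is known not to change, a single ordinary 64-bit store to the lower half updates the entry without cmpxchg16b. The type and function names below are hypothetical stand-ins, and `assert()` plays the role of Xen's ASSERT():

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit IDT gate viewed as two 64-bit halves. */
typedef struct {
    uint64_t a;   /* lower: offset[15:0], selector, IST, type/DPL/P, offset[31:16] */
    uint64_t b;   /* upper: offset[63:32] and reserved bits */
} idt_entry_t;

/*
 * Update a gate with a single 64-bit store to its lower half. This is only
 * valid when the upper half is unchanged, which is checked first.
 */
static void write_gate_lower(volatile idt_entry_t *gate, const idt_entry_t *new)
{
    assert(gate->b == new->b);
    gate->a = new->a;
}
```

Since the whole visible change lands in one aligned 64-bit write, an NMI or MCE arriving mid-update sees either the old or the new entry, never a torn one.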

The IDT entry helpers are required because:
  * It is unsafe to attempt a disable/update/re-enable cycle on the
    NMI or MCE IDT entries.
  * We need to be able to update NMI handlers without changing the IST
    entry.

As a result, the new behaviour of the kexec_crash path is:

nmi_shootdown_cpus() will:

* Disable the crashing cpu's NMI/MCE interrupt stack tables.
    Disabling the stack tables removes race conditions which would lead
    to corrupt exception frames and infinite loops.  As this pcpu is
    never planning to execute a sysret back to a pv vcpu, the update is
    safe from a security point of view.

* Swap the NMI trap handlers.
    The crashing pcpu gets the nop handler, to prevent it getting stuck in
    an NMI context, which would cause a hang instead of a crash.  The
    non-crashing pcpus all get the nmi_crash handler which is designed
    never to return.

do_nmi_crash() will:

* Save the crash notes and shut the pcpu down.
    There is now an extra per-cpu variable to prevent us from
    executing this multiple times.  In the case where we reenter
    midway through, attempt the whole operation again in preference to
    not completing it in the first place.

* Set up another NMI at the LAPIC.
    Even when the LAPIC has been disabled, the ID and command
    registers are still usable.  As a result, we can deliberately
    queue up a new NMI to re-interrupt us later if NMIs get unlatched.
    Because of the call to __stop_this_cpu(), we have to hand craft
    self_nmi() to be safe from General Protection Faults.

* Fall into an infinite loop.

machine_kexec() will:

  * Swap the MCE handlers to be a nop.
     We cannot prevent MCEs from being delivered when we pass off to
     the crash kernel, and the less Xen context is being touched the
     better.

  * Explicitly enable NMIs.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Minor style changes.

Signed-off-by: Keir Fraser <keir@xen.org>
Committed-by: Keir Fraser <keir@xen.org>

x86/mm/hap: Adjust vram tracking to play nicely with log-dirty.

The previous code assumed the guest would be in one of three mutually exclusive
modes for bookkeeping dirty pages: (1) shadow, (2) hap utilizing the log dirty
bitmap to support functionality such as live migrate, (3) hap utilizing the
log dirty bitmap to track dirty vram pages.
Races arose when a guest attempted to track dirty vram while performing live
migrate.  (The dispatch table managed by paging_log_dirty_init() might change
in the middle of a log dirty or a vram tracking function.)

This change allows hap log dirty and hap vram tracking to be concurrent.
Vram tracking no longer uses the log dirty bitmap.  Instead it detects
dirty vram pages by examining their p2m type.  The log dirty bitmap is only
used by the log dirty code.  Because the two operations use different
mechanisms, they are no longer mutually exclusive.
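A minimal sketch of detecting dirty vram pages from their p2m type rather than from the log dirty bitmap. It assumes that a write to a page marked `p2m_ram_logdirty` flips it to `p2m_ram_rw`; the two-value enum and `vram_collect_dirty()` are hypothetical simplifications named after Xen's types, not the real implementation:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical subset of p2m types. */
typedef enum { p2m_ram_rw, p2m_ram_logdirty } p2m_type_t;

/*
 * Scan the vram range: any page whose type became p2m_ram_rw was written
 * since the last scan. Record it in the caller's private bitmap and re-arm
 * it as p2m_ram_logdirty. The log dirty bitmap is never touched.
 */
static size_t vram_collect_dirty(p2m_type_t *types, size_t nr,
                                 uint8_t *dirty_bitmap)
{
    size_t count = 0;

    for (size_t i = 0; i < nr; i++) {
        if (types[i] == p2m_ram_rw) {
            dirty_bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
            types[i] = p2m_ram_logdirty;   /* re-arm for the next round */
            count++;
        }
    }
    return count;
}
```

Because the tracker keeps its own bitmap, a concurrent live migration using the log dirty bitmap does not interfere with it.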

Signed-Off-By: Robert Phillips <robert.phillips@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Minor whitespace changes to conform with coding style
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>

libxl: introduce XSM relabel on build

Allow a domain to be built under one security label and run using a
different label.  This can be used to prevent the domain builder or
control domain from having the ability to access a guest domain's memory
via map_foreign_range except during the build process where this is
required.

Example domain configuration snippet:
  seclabel='customer_1:vm_r:nomigrate_t'
  init_seclabel='customer_1:vm_r:nomigrate_t_building'

Note: this does not provide complete protection from a malicious dom0;
mappings created during the build process may persist after the relabel,
and could be used to indirectly access the guest's memory. However, if
dom0 correctly unmaps the domain upon building, the domU is protected
against dom0 becoming malicious in the future.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

libxl: qemu trad logdirty: Tolerate ENOENT on ret path

It can happen in error conditions that lds->ret_path doesn't exist,
and libxl__xs_read_checked signals this by setting got_ret=NULL. If
this happens, fail without crashing.

Reported-by: Alex Bligh <alex@alex.org.uk>,
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: use strcmp in device_tree_type_matches

We want to match the exact string rather than the first subset.
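The difference can be illustrated with two hypothetical helpers, assuming the old code compared only as many characters as the requested type has. With a prefix comparison, a node whose type is "cpus" would wrongly match a lookup for "cpu":

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Prefix match: only strlen(match) characters are compared. */
static bool type_matches_prefix(const char *type, const char *match)
{
    return strncmp(type, match, strlen(match)) == 0;
}

/* Exact match, as with strcmp. */
static bool type_matches_exact(const char *type, const char *match)
{
    return strcmp(type, match) == 0;
}
```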

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xen: get GIC addresses from DT

Get the addresses of the GIC distributor, CPU, virtual, and virtual CPU
interface registers from the device tree.

Note: I couldn't completely get rid of GIC_BASE_ADDRESS, GIC_DR_OFFSET
and friends because we are using them from mode_switch.S, which is
executed before the device tree has been parsed. But at least
mode_switch.S is known to contain vexpress-specific code anyway.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

vscsiif: allow larger segments-per-request values

At least certain tape devices require fixed size blocks to be operated
upon, i.e. breaking up of I/O requests is not permitted. Consequently
we need an interface extension that (leaving aside implementation
limitations) doesn't impose a limit on the number of segments that can
be associated with an individual request.

This, in turn, excludes the blkif extension FreeBSD folks implemented,
as that still imposes an upper limit (the actual I/O request still
specifies the full number of segments - as an 8-bit quantity -, and
subsequent ring slots get used to carry the excess segment
descriptors).

The alternative therefore is to allow the frontend to pre-set segment
descriptors _before_ actually issuing the I/O request. I/O will then
be done by the backend for the accumulated set of segments.

To properly associate segment preset operations with the main request,
the rqid-s between them should match (originally I had hoped to use
this to avoid producing individual responses for the pre-set
operations, but that turned out to violate the underlying shared ring
implementation).

Negotiation of the maximum number of segments a particular backend
implementation supports happens through a new "segs-per-req" xenstore
node.

Signed-off-by: Jan Beulich <jbeulich@suse.com>

VMX: intr.c: remove i386 related code

i386 arch is no longer supported by Xen, remove the related code.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Committed-by: Jan Beulich <jbeulich@suse.com>

x86/IST: Create set_ist() helper function

... to save using open-coded bitwise operations, and update all IST
manipulation sites to use the function.
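A sketch of such a helper, assuming the architectural gate layout in which the 3-bit IST index sits in bits 32-34 of the gate's lower 64 bits. The `idt_entry_t` type and the `IST_FIELD_MAX` bound are hypothetical names chosen for this illustration:

```c
#include <assert.h>
#include <stdint.h>

#define IST_FIELD_MAX 7UL   /* the IST field is 3 bits wide; 0 means "no IST" */

typedef struct {
    uint64_t a, b;   /* lower and upper halves of a 64-bit IDT gate */
} idt_entry_t;

/* Replace the IST index in bits 32-34, leaving every other bit intact. */
static void set_ist(idt_entry_t *idt, unsigned long ist)
{
    assert(ist <= IST_FIELD_MAX);
    idt->a = (idt->a & ~(UINT64_C(7) << 32)) | ((uint64_t)ist << 32);
}
```

Centralising the mask-and-shift in one function removes the risk of an open-coded site clobbering neighbouring gate bits.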

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Committed-by: Jan Beulich <jbeulich@suse.com>

x86/ucode: Improve error handling and container file processing on AMD

Do not report an error when a patch is not applicable to the current
processor; simply skip it and move on to the next patch in the container
file.

Process container file to the end instead of stopping at the first
applicable patch.

Log the fact that a patch has been applied at KERN_WARNING level, modify
debug messages.

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@amd.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>

x86/EFI: work around CFLAGS being passed in through environment

Short of a solution to the problem described in
http://lists.xen.org/archives/html/xen-devel/2012-12/msg00648.html,
deal with the bad effect that this, together with c/s 25751:02b4d5fedb7b,
has on the EFI build by filtering out the problematic command line items.

Signed-off-by: Charles Arnold <carnold@suse.com>
Signed-off-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Jan Beulich <jbeulich@suse.com>

x86: frame table related improvements

- fix super page frame table setup for memory hotplug case (should
  create full table, or else the hotplug code would need to do the
  necessary table population)
- simplify super page frame table setup (can re-use frame table setup
  code)
- slightly streamline frame table setup code
- fix (tighten) a BUG_ON() and an ASSERT() condition
- fix spage <-> pdx conversion macros (they had no users so far, and
  hence no-one noticed how broken they were)

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

xen: reserve next two XENMEM_ op numbers for future/out-of-tree use

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Keir Fraser <keir@xen.org>

xen: centralize accounting for domain tot_pages

Provide and use a common function for all adjustments to a
domain's tot_pages counter in anticipation of future and/or
out-of-tree patches that must adjust related counters
atomically.
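A minimal sketch of a single adjustment point for tot_pages. It assumes callers hold the domain's page allocation lock, modelled here by a hypothetical boolean flag; the trimmed `struct domain` is likewise a stand-in:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, trimmed-down domain structure. */
struct domain {
    bool page_alloc_locked;   /* stand-in for "caller holds page_alloc_lock" */
    long tot_pages;
};

/*
 * The one place where tot_pages changes. Future code that must keep other
 * counters in step only needs to hook in here.
 */
static long domain_adjust_tot_pages(struct domain *d, long pages)
{
    assert(d->page_alloc_locked);
    d->tot_pages += pages;
    return d->tot_pages;
}
```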

Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
Committed-by: Keir Fraser <keir@xen.org>

streamline guest copy operations

- use the variants not validating the VA range when writing back
  structures/fields to the same space that they were previously read
  from
- when only a single field of a structure actually changed, copy back
  just that field where possible
- consolidate copying back results in a few places

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/oprofile: adjust CPU specific initialization

Drop support for 32-bit only CPU models as well as those that can be
dealt with by the arch_perfmon bits. Models 14 and 15 remain as
questionable (I'm not 100% positive that these don't support 64-bit
mode).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

scheduler: fix rate limit range checking

First, neither of the two checks permitted the documented value of zero
(which disables the functionality altogether).

Second, the range checking of the command line parameter was done by
the credit scheduler's initialization code, despite it being a generic
scheduler option.
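The corrected check can be sketched as a single generic helper in which zero is accepted alongside the normal range. The bounds below are assumptions for illustration; consult the real sysctl interface for the actual limits:

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed bounds in microseconds (illustrative only). */
#define RATELIMIT_MIN_US 100u
#define RATELIMIT_MAX_US 500000u

/* 0 is the documented "disabled" value, so it must pass as well. */
static bool ratelimit_valid(unsigned int us)
{
    return us == 0 || (us >= RATELIMIT_MIN_US && us <= RATELIMIT_MAX_US);
}
```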

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

QEMU_TAG update

x86: mark certain items static

..., and at once constify the data items among them.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: add missing assert to stdvga's mmio_move()

... to match the IOREQ_READ path.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/EFI: add code interfacing with the secure boot shim

... to validate the kernel image (which is required to be in PE
format, as is e.g. the case for the Linux bzImage when built with
CONFIG_EFI_STUB).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/p2m: drop redundant macro definitions

Also, add log level indicator to P2M_ERROR(), and drop pointless
underscores from all related macros' parameter names.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: properly fail mmuext ops when get_page_from_gfn() fails

I noticed this inconsistency while analyzing the code for XSA-32.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

nested vmx: check host ability when intercept MSR read

When the guest hypervisor tries to read an MSR value, we intercept the
access and return certain emulated values. We also need to ensure that
those emulated values are compatible with the host's capabilities.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: fix interrupt delivery to L2 guest

While delivering an interrupt into the L2 guest, the L0 hypervisor needs
to check whether the L1 hypervisor wants to own the interrupt; if not, it
injects the interrupt directly into the L2 guest.

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable PAUSE and RDPMC exiting for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable "Virtualize APIC accesses" feature for L1 VMM

If the "Virtualize APIC accesses" feature is enabled, we need to sync
the APIC-access address from the virtual VMCS (vvmcs) into the shadow
VMCS when doing virtual_vmentry.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable IA32E mode while do VM entry

Some VMMs may check the platform capability to determine whether a long
mode guest is supported. Therefore we need to expose this bit to the
guest VMM.

Xen on Xen works fine with the current solution because Xen doesn't
check this capability but directly sets it in the VMCS if the guest
supports long mode.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: fix DR access VM exit

For DR registers, we use a lazy restore mechanism on access. Therefore,
when receiving such a VM exit, L0 is responsible for switching to the
right DR values before injecting the exit into the L1 hypervisor.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: fix handling of RDTSC

If L0 is to handle the TSC access, then we need to update guest EIP by
calling update_guest_eip().

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: fix rflags status in virtual vmexit

As stated in the SDM, all bits in rflags (except for those reserved as 1)
are cleared to 0 on VM exit. Therefore we need to follow this logic in
virtual_vmexit.
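In code terms, the rule means that after a (virtual) VM exit RFLAGS holds only the always-one reserved bit 1, whatever the L2 value was. `X86_EFLAGS_MBS` and `rflags_after_vmexit()` are names used here for illustration:

```c
#include <assert.h>
#include <stdint.h>

#define X86_EFLAGS_MBS (UINT64_C(1) << 1)   /* reserved bit 1, always set */

/* RFLAGS on VM exit: everything cleared except the must-be-set bit. */
static uint64_t rflags_after_vmexit(uint64_t guest_rflags)
{
    (void)guest_rflags;   /* no bits of the L2 value survive */
    return X86_EFLAGS_MBS;
}
```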

Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: expose bit 55 of IA32_VMX_BASIC_MSR to guest VMM

Also, use a symbolic name instead of a hard-coded number for bit 55 of
IA32_VMX_BASIC_MSR.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: use literal name instead of hard numbers

For the default-1 settings in the VMX MSRs, use symbolic names instead
of hard-coded numbers in the code.

Also, fix the default-1 setting for the pin-based control MSR.

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: emulate MSR bitmaps

For nested VMX virtualization of MSR bitmaps, the L0 hypervisor traps
all MSR accesses from the L2 guest by disabling the MSR_BITMAP feature.
When handling such a VM exit, L0 checks whether the L1 hypervisor uses
the MSR_BITMAP feature and whether the corresponding bit is set to 1. If
so, L0 injects the VM exit into the L1 hypervisor; otherwise, L0 handles
the VM exit itself.
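The bitmap lookup can be sketched from the architectural layout described in the Intel SDM: a 4 KiB page split into four 1 KiB regions (read-low, read-high, write-low, write-high), where "low" covers MSRs 0x0-0x1fff and "high" covers 0xc0000000-0xc0001fff. The helper names are hypothetical, and the sketch also assumes, per the SDM, that when L1 has not enabled MSR bitmaps every MSR access must be reflected to L1:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Does L1's MSR bitmap request a VM exit for this access? */
static bool msr_bitmap_intercepts(const uint8_t bitmap[4096], uint32_t msr,
                                  bool is_write)
{
    unsigned int base;

    if (msr <= 0x1fff) {
        base = 0;                      /* low MSR range */
    } else if (msr >= 0xc0000000 && msr <= 0xc0001fff) {
        base = 1024;                   /* high MSR range */
        msr -= 0xc0000000;
    } else {
        return true;                   /* outside both ranges: always exits */
    }

    if (is_write)
        base += 2048;                  /* write regions follow read regions */

    return (bitmap[base + msr / 8] >> (msr % 8)) & 1;
}

/* L0's decision: reflect the exit to L1, or handle it itself. */
static bool reflect_msr_exit_to_l1(bool l1_uses_bitmap, const uint8_t *l1_bitmap,
                                   uint32_t msr, bool is_write)
{
    if (!l1_uses_bitmap)
        return true;                   /* without bitmaps, every MSR access exits */
    return msr_bitmap_intercepts(l1_bitmap, msr, is_write);
}
```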

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Committed-by: Keir Fraser <keir@xen.org>

tools/gdbsx: fix build failure with glibc-2.17

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Committed-by: Keir Fraser <keir@xen.org>

tighten guest memory accesses

Failure should always be detected and handled.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

memop: adjust error checking in populate_physmap()

Checking that multi-page allocations are permitted is unnecessary for
PoD population operations. Instead, the (loop invariant) check added
for addressing XSA-31 can be moved here.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86/HVM: remove dead code

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

gnttab_usage_print() should be static

... as not being used or declared anywhere else.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

X86/vMCE: handle broken page with regard to migration

At the sender
  xc_domain_save has a key point: 'to query the types of all the pages
  with xc_get_pfn_type_batch'
  1) if a broken page occurs before the key point, migration will be
     fine, since the proper pfn_type and pfn number will be transferred
     to the target, which then takes appropriate action;
  2) if a broken page occurs after the key point, the whole system will
     crash, so there is no need to care about migration any more;

At the target
  The target populates pages for the guest. In the case of a broken page,
  we prefer to keep the type of the page for the sake of seamless migration.
  The target sets the p2m type to p2m_ram_broken for a broken page. If the
  guest accesses the broken page again, it will kill itself as expected.

Suggested-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Liu Jinsong <jinsong.liu@intel.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

libxl: Make an internal function explicitly check existence of expected paths

libxl__device_disk_from_xs_be() was failing without error for some
missing xenstore nodes in a backend, while assuming (without checking)
that other nodes were valid, causing a crash when another internal
error wrote these nodes in the wrong place.

Make this function consistent by:
* Checking the existence of all nodes before using them
* Choosing a default only when the node is not written in device_disk_add()
* Failing with log msg if any node written by device_disk_add() is not present
* Returning an error on failure
* Disposing of the structure before returning using libxl_device_disk_dispose()

Also make the callers of the function pay attention to the error and
behave appropriately. In the case of libxl__append_disk_list_of_type(),
this means only incrementing *ndisks as the disk structures are
successfully initialized.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: disable interrupts on return_to_hypervisor

At the moment it is possible to reach return_to_hypervisor with
interrupts enabled (this happens every time we actually go back to
hypervisor mode, i.e. when we don't take the return_to_guest path).

If that happens we risk losing the content of ELR_hyp: if we receive an
interrupt right after restoring ELR_hyp, once we come back we'll have a
different value in ELR_hyp and the original is lost.

In order to make the return_to_hypervisor path safe, we disable
interrupts before restoring any registers.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

README: docs/pdf/user.pdf was deleted in 24563:4271634e4c86

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

gitignore: ignore xen-foreign/arm.h

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

MAINTAINERS: Reference stable maintenance policy

I also couldn't resist fixing a typo and adding a reference to
http://wiki.xen.org/wiki/Submitting_Xen_Patches for the normal case as
well.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Keir Fraser <keir@xen.org>
Committed-by: Jan Beulich <jbeulich@suse.com>

mini-os: drop shutdown variables when CONFIG_XENBUS=n

Shutdown variables are meaningless when CONFIG_XENBUS=n since no
shutdown event will ever happen. Better to make sure that no code tries
to use them and then waits forever for a shutdown event that never comes.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Keir Fraser <keir@xen.org>

MAINTAINERS: Device tree is maintained by the ARM maintainers

Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>

IOMMU/ATS: fix maximum queue depth calculation

The capabilities register field is a 5-bit value, and the 5 bits all
being zero actually means 32 entries.

Under the assumption that amd_iommu_flush_iotlb() really just tried
to correct for the miscalculation above when adding 32 to the value,
that adjustment is also being removed.
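The corrected decoding can be sketched as follows, assuming the Invalidate Queue Depth field occupies bits 4:0 of the ATS capability register as in the PCIe ATS specification. Values 1-31 are used as-is and 0 encodes 32, so no separate "+32" correction is needed afterwards:

```c
#include <assert.h>
#include <stdint.h>

/* Decode the 5-bit queue depth field; an all-zero field means 32 entries. */
static unsigned int ats_queue_depth(uint16_t ats_cap)
{
    unsigned int depth = ats_cap & 0x1f;
    return depth ? depth : 32;
}
```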

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Wei Huang <wei.huang2@amd.com>

x86: get_page_from_gfn() must return NULL for invalid GFNs

... also in the non-translated case.

This is XSA-32 / CVE-2012-xxxx.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

memop: limit guest specified extent order

Allowing unbounded order values here causes almost unbounded loops
and/or partially incomplete requests, particularly in PoD code.

The added range checks in populate_physmap(), decrease_reservation(),
and the "in" one in memory_exchange() architecturally all could use
PADDR_BITS - PAGE_SHIFT, and are being artificially constrained to
MAX_ORDER.

This is XSA-31 / CVE-2012-5515.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

xen: fix error handling of guest_physmap_mark_populate_on_demand()

The only user of the "out" label bypasses a necessary unlock, thus
enabling the caller to lock up Xen.

Also, the function was never meant to be called by a guest for itself,
so rather than inspecting the code paths in depth for potential other
problems this might cause, and adjusting e.g. the non-guest printk()
in the above error path, just disallow the guest access to it.

Finally, the printk() (given its potential for spamming the log, all
the more so since it's not using XENLOG_GUEST) is being converted to
P2M_DEBUG(), as debugging is apparently what it was added for in the
first place.

This is XSA-30 / CVE-2012-5514.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

xen: add missing guest address range checks to XENMEM_exchange handlers

Ever since its introduction (3.0.3 iirc) the handler for this has been
using guest memory accessors that do not check the address range (i.e.
the ones prefixed with two underscores) without first range checking
the accessed space (via guest_handle_okay()), allowing a guest to
access and overwrite hypervisor memory.
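The missing step is an overflow-safe range check performed before any unchecked accessor is used. A minimal sketch in the spirit of guest_handle_okay(), where `GUEST_VA_LIMIT` and `guest_range_ok()` are hypothetical names and the limit value is only illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical top of the guest-accessible virtual address range. */
#define GUEST_VA_LIMIT UINT64_C(0x00007fffffffffff)

/*
 * Check that [addr, addr + count * elem_size) lies entirely at or below
 * GUEST_VA_LIMIT, without letting the size or end-address arithmetic wrap.
 */
static bool guest_range_ok(uint64_t addr, uint64_t count, uint64_t elem_size)
{
    if (elem_size && count > UINT64_MAX / elem_size)
        return false;                   /* total size would overflow */

    uint64_t len = count * elem_size;
    if (len == 0)
        return true;

    return addr <= GUEST_VA_LIMIT && len - 1 <= GUEST_VA_LIMIT - addr;
}
```

Comparing `len - 1` against `GUEST_VA_LIMIT - addr` avoids computing `addr + len`, which a malicious guest could otherwise make wrap around.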

This is XSA-29 / CVE-2012-5513.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

hvm: Limit the size of large HVM op batches

Doing large p2m updates for HVMOP_track_dirty_vram without preemption
ties up the physical processor. Integrating preemption into the p2m
updates is hard, so simply limit the range to 1GB, which is sufficient
for a 15000 x 15000 x 32bpp framebuffer.

For HVMOP_modified_memory and HVMOP_set_mem_type, add the necessary
machinery to make them preemptible.
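The sizing claim can be checked with a little arithmetic, taking 1GB as 2^30 bytes and pages as 4 KiB (both assumptions for this check). A 15000 x 15000 framebuffer at 32bpp needs 900,000,000 bytes, about 219,727 pages, which fits under the 262,144 pages in 1GB:

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SIZE 4096ULL   /* assumed page size */

/* Bytes needed for a width x height framebuffer at bpp bits per pixel. */
static uint64_t fb_bytes(uint64_t width, uint64_t height, uint64_t bpp)
{
    return width * height * bpp / 8;
}

/* The same size expressed in pages, rounded up. */
static uint64_t fb_pages(uint64_t bytes)
{
    return (bytes + PAGE_SIZE - 1) / PAGE_SIZE;
}
```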

This is CVE-2012-5511 / XSA-27.

Signed-off-by: Tim Deegan <tim@xen.org>
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

gnttab: fix releasing of memory upon switches between versions

gnttab_unpopulate_status_frames() incompletely freed the pages
previously used as status frames, in that they did not get removed from
the domain's xenpage_list, thus causing subsequent list corruption
when those pages did get allocated again for the same or another purpose.

Similarly, grant_table_create() and gnttab_grow_table() both improperly
clean up in the event of an error - pages already shared with the guest
can't be freed by just passing them to free_xenheap_page(). Fix this by
sharing the pages only after all allocations succeeded.
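The "allocate everything, then share" ordering can be sketched as below. `struct frame` and `share_with_guest()` are hypothetical stand-ins: the point is that the failure path only ever frees pages that were never shared, so plain freeing is safe there:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical frame: once shared with the guest it must not simply be freed. */
struct frame { bool shared; };

static void share_with_guest(struct frame *f) { f->shared = true; }

/* Returns 0 on success; on failure nothing has been shared and all is freed. */
static int alloc_and_share(struct frame **frames, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        frames[i] = calloc(1, sizeof(*frames[i]));
        if (!frames[i])
            goto fail;
    }
    for (i = 0; i < n; i++)
        share_with_guest(frames[i]);   /* only after every allocation succeeded */
    return 0;

 fail:
    while (i--)                        /* unwind: none of these were shared yet */
        free(frames[i]);
    return -1;
}
```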

This is CVE-2012-5510 / XSA-26.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>

xl: Check for duplicate vncdisplay options, and return an error

If the user has set a vnc display number both in vnclisten (with
"xxxx:yy"), and with vncdisplay, throw an error.
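A minimal sketch of the duplicate check, assuming a display number in vnclisten is recognised by the presence of a ':' and that a hypothetical `vncdisplay_set` flag records whether vncdisplay was given. Note this simple test would also fire on IPv6 address literals, which a real parser would need to handle:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True when a display number is given both in vnclisten and via vncdisplay. */
static bool vnc_display_conflict(const char *vnclisten, bool vncdisplay_set)
{
    return vnclisten != NULL && strchr(vnclisten, ':') != NULL && vncdisplay_set;
}
```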

Update man pages to match.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xen: arm: Use $(OBJCOPY) not bare objcopy

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reported-by: Anthony PERARD <anthony.perard@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

libxl: fix wrong comment

The comment in function libxl__try_phy_backend is wrong: 1 is returned
if the backend should be handled as "phy", while 0 is returned if not.

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

docs: expand persistent grants protocol

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

mini-os: shutdown_thread depends on xenbus

This fixes the build of the xenstore stub domain, which should never
be shut down and so does not need this feature.

Signed-off-by: Daniel De Graaf <dgdegra@tycho.nsa.gov>
Committed-by: Keir Fraser <keir@xen.org>

arm: const-correctness in virt_to_maddr

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

arm: handle xenheap which isn't at the start of RAM.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

arm: create a raw binary target.

This is suitable for direct loading by a bootloader.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

arm: Enable build without CONFIG_DTB_FILE

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

AMD IOMMU: add locking missing from c/s 26198:ba90ecb0231f

An oversight of mine; I'm sorry.

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>

Add gtags and tags rune in gitignore.

Signed-off-by: Wei Liu <wei.liu2@citrix.com>
Committed-by: Keir Fraser <keir@xen.org>

minios/console: console_input() weak reference

In exactly the same style as app_main() in kernel.c, create a weak
reference console_input() function for applications to override to
quickly gain access to the console.
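The weak-reference pattern can be sketched with the GCC/Clang `weak` attribute: the library ships a do-nothing default, and an application that defines its own non-weak `console_input()` replaces it at link time. The signature shown is an assumption for illustration:

```c
#include <assert.h>

/*
 * Default (weak) handler: ignores console input. A strong definition
 * elsewhere in the link overrides this one.
 */
void __attribute__((weak)) console_input(char *buf, unsigned len)
{
    (void)buf;
    (void)len;
}
```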

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Acked-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Keir Fraser <keir@xen.org>

[minios] Add xenbus shutdown control support

Add a thread that watches the xenbus shutdown control path and notifies
a wait queue.

Add a HYPERVISOR_shutdown convenience inline for minios shutdown.

Add proper shutdown to the minios test application.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Keir Fraser <keir@xen.org>

[minios] Fix test application link when pcifront is not enabled

When pcifront is not enabled, the test application needs to disable
the PCI test.

Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable load IA32_PERF_GLOBAL_CTRL feature for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable load and save IA32_EFER feature for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable load and save IA32_PAT feature for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable save VMX-preemption timer feature for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable VMX-preemption timer for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable Descriptor-table exiting for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable secondary processor-based VM-Execution controls

Enable the secondary processor-based VM-execution controls in the VMCS.

Besides that, add a helper function to query a given control bit in
the secondary processor-based controls MSR.
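For reference, the Intel SDM defines the capability MSR
(IA32_VMX_PROCBASED_CTLS2) as follows: bits 31:0 give the allowed
0-settings (a set bit means the control must be 1) and bits 63:32 give
the allowed 1-settings. The secondary controls only take effect when
bit 31 ("activate secondary controls") of the primary processor-based
controls is set. A hedged sketch of such a helper; the function names
are hypothetical and only two control bits are defined:

```c
#include <stdbool.h>
#include <stdint.h>

/* Two secondary processor-based control bits (Intel SDM). */
#define SECONDARY_EXEC_ENABLE_EPT                (1u << 1)
#define SECONDARY_EXEC_DESCRIPTOR_TABLE_EXITING  (1u << 2)

/* True if the CPU allows the given control bit to be set to 1. */
static bool secondary_ctl_allowed1(uint64_t cap_msr, uint32_t ctl_bit)
{
    return ((uint32_t)(cap_msr >> 32) & ctl_bit) != 0;
}

/* Force required bits on and drop bits the CPU cannot set. */
static uint32_t adjust_secondary_ctls(uint64_t cap_msr, uint32_t wanted)
{
    uint32_t must_be_1 = (uint32_t)cap_msr;
    uint32_t may_be_1 = (uint32_t)(cap_msr >> 32);

    return (wanted | must_be_1) & may_be_1;
}
```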

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Signed-off-by: Xiantao Zhang <xiantao.zhang@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable NMI-window exiting for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

nested vmx: enable Monitor Trap Flag for L1 VMM

Signed-off-by: Dongxiao Xu <dongxiao.xu@intel.com>
Acked-by: Jun Nakajima <jun.nakajima@intel.com>
Committed-by: Keir Fraser <keir@xen.org>

docs: fix persistent grants doc typo

Signed-off-by: Roger Pau Monné <roger.pau@citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- fix additional typo s/this grants/these grants/g ]
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xen/arm: build as zImage

The zImage format is extremely simple: it consists only of a magic
number and two addresses at fixed offsets (see
http://www.simtec.co.uk/products/SWLINUX/files/booting_article.html#d0e309).

Some bootloaders expect a zImage; since making Xen compatible with the
format costs us little, do so.
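Per the booting article linked above, the magic number 0x016F2818 sits
at offset 0x24, followed by the start and end addresses at 0x28 and
0x2c. A minimal sketch of how a loader might recognise such an image,
assuming little-endian words; zimage_probe() is a hypothetical name:

```c
#include <stddef.h>
#include <stdint.h>

#define ZIMAGE_MAGIC_OFFSET 0x24
#define ZIMAGE_MAGIC        0x016f2818u

static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Return 0 and fill *start/*end if img looks like a zImage, else -1. */
static int zimage_probe(const uint8_t *img, size_t len,
                        uint32_t *start, uint32_t *end)
{
    if (len < ZIMAGE_MAGIC_OFFSET + 12)       /* magic + two addresses */
        return -1;
    if (read_le32(img + ZIMAGE_MAGIC_OFFSET) != ZIMAGE_MAGIC)
        return -1;
    *start = read_le32(img + ZIMAGE_MAGIC_OFFSET + 4);
    *end = read_le32(img + ZIMAGE_MAGIC_OFFSET + 8);
    return 0;
}
```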

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Tim Deegan <tim@xen.org>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
[ ijc -- switch from 7*nop + nop to just 8*nop ]
Committed-by: Ian Campbell <ian.campbell@citrix.com>

x86/hap: Fix memory leak of domain->arch.hvm_domain.dirty_vram

Signed-off-by: Kouya Shimura <kouya@jp.fujitsu.com>
Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>

x86/mm: Comment the definitions of _mfn(), _gfn() &c.

It's not very easy to find them if you don't know to look for the
TYPE_SAFE() macro.
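For readers unfamiliar with the idiom, a simplified sketch of how such
a macro can work (an illustration, not Xen's exact definition):
wrapping the raw value in a one-member struct makes mixing up, e.g., an
mfn_t and a gfn_t a compile-time error, while _mfn() and mfn_x()
convert to and from the raw number.

```c
/* Wrap _type in a distinct struct type plus two conversion helpers. */
#define TYPE_SAFE(_type, _name)                                        \
    typedef struct { _type _name; } _name##_t;                         \
    static inline _name##_t _##_name(_type n) { return (_name##_t){ n }; } \
    static inline _type _name##_x(_name##_t n) { return n._name; }

TYPE_SAFE(unsigned long, mfn)   /* defines mfn_t, _mfn(), mfn_x() */
TYPE_SAFE(unsigned long, gfn)   /* defines gfn_t, _gfn(), gfn_x() */
```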

Signed-off-by: Tim Deegan <tim@xen.org>
Committed-by: Tim Deegan <tim@xen.org>

VT-d: make scope parsing code type safe

Rather than requiring the scopes to be the first members of their
respective structures (so that casts can be used to switch between the
different views), properly use types and container_of().
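A sketch of the idea with hypothetical structure names: the scope is
deliberately not the first member, so a plain cast from the scope
pointer would be wrong, while container_of() subtracts the member's
offset to recover the enclosing structure.

```c
#include <stddef.h>

#define container_of(ptr, type, member) \
    ((type *)((char *)(ptr) - offsetof(type, member)))

struct dev_scope {               /* hypothetical scope type */
    unsigned int devices_cnt;
};

struct drhd_unit {               /* hypothetical owning structure */
    unsigned long address;
    struct dev_scope scope;      /* NOT the first member */
};

static struct drhd_unit *scope_to_drhd(struct dev_scope *s)
{
    return container_of(s, struct drhd_unit, scope);
}
```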

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>
Acked-by: Xiantao Zhang <xiantao.zhang@intel.com>

IOMMU: imply "verbose" from "debug"

I think that generally enabling debugging code without also enabling
verbose output is rather pointless; if someone really wants this, they
can always pass e.g. "iommu=debug,no-verbose".
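A hypothetical parser illustrating the intended semantics: options are
applied left to right, so "debug" turns on verbose output and a later
"no-verbose" turns it back off. This is not Xen's actual option parser:

```c
#include <stdbool.h>
#include <string.h>

struct iommu_opts {
    bool debug;
    bool verbose;
};

/* Parse a comma-separated list such as "debug,no-verbose". */
static void parse_iommu_opts(const char *s, struct iommu_opts *o)
{
    while (*s) {
        const char *comma = strchr(s, ',');
        size_t len = comma ? (size_t)(comma - s) : strlen(s);
        bool val = true;

        if (len > 3 && strncmp(s, "no-", 3) == 0) {  /* negated option */
            val = false;
            s += 3;
            len -= 3;
        }
        if (len == 5 && strncmp(s, "debug", 5) == 0) {
            o->debug = val;
            if (val)
                o->verbose = true;        /* "debug" implies "verbose" */
        } else if (len == 7 && strncmp(s, "verbose", 7) == 0) {
            o->verbose = val;
        }
        s += len;
        if (*s == ',')
            s++;
    }
}
```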

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

VT-d: adjust IOMMU interrupt affinities when all CPUs are online

Since these interrupts get set up before the APs are brought online,
their affinities could so far only ever point to CPU 0. Adjust this to
include potentially multiple CPUs in the target mask (when running in
one of the cluster modes), and take NUMA information into account (so
that the interrupts are handled on a CPU in the node where the
respective IOMMU resides).
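A simplified sketch of the mask selection, using 64-bit masks as
hypothetical stand-ins for cpumask_t and ignoring the APIC-mode-specific
narrowing of the mask to a single cluster:

```c
#include <stdint.h>

/* Bit n set means CPU n is in the set (hypothetical representation). */
typedef uint64_t cpumask64_t;

/*
 * Prefer online CPUs on the IOMMU's node; if there are none (no NUMA
 * information, or no CPU of that node online yet), use all online CPUs.
 */
static cpumask64_t iommu_irq_affinity(cpumask64_t online,
                                      cpumask64_t node_cpus)
{
    cpumask64_t mask = online & node_cpus;

    return mask ? mask : online;
}
```

Before the APs are online, the online set is just CPU 0, so the
fallback reproduces the old behaviour.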

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

AMD IOMMU: include IOMMU interrupt information in 'M' debug key output

Note that this also adds a few pieces missing from c/s
25903:5e4a00b4114c (relevant only when the PCI MSI mask bit is
supported by an IOMMU, which apparently isn't the case for existing
implementations).

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

VT-d: include IOMMU interrupt information in 'M' debug key output

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

ACPI: fix return value of XEN_PM_PDC platform op

Should return -EFAULT when copying to guest memory fails.

While touching this code, also switch to using the more relaxed copy
function (copying from the same guest memory already validated the
virtual address range).
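A minimal sketch of the error-handling pattern. Xen's guest copy
helpers return the number of bytes not copied rather than an errno
value, so a non-zero result has to be translated into -EFAULT.
fake_copy_to_guest() and write_back_pdc() are hypothetical, with a fail
flag standing in for a faulting guest address:

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in: returns bytes NOT copied (0 on success). */
static size_t fake_copy_to_guest(void *dst, const void *src, size_t n,
                                 int fail)
{
    if (fail)
        return n;
    memcpy(dst, src, n);
    return 0;
}

static int write_back_pdc(void *guest_buf, const unsigned int *pdc,
                          size_t n, int fail)
{
    /* A byte count is not an error code: map any shortfall to -EFAULT. */
    if (fake_copy_to_guest(guest_buf, pdc, n * sizeof(*pdc), fail))
        return -EFAULT;
    return 0;
}
```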

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

x86: fix hypercall continuation cancellation in XENMAPSPACE_gmfn_range compat wrapper

When no continuation was established, there must also not be an attempt
to cancel one: hypercall_cancel_continuation(), in the non-HVM,
non-multicall case, adjusts the guest mode return address in a way that
assumes an earlier call to hypercall_create_continuation() took place.

While touching this code, also restructure it slightly to improve
readability, and switch to using the more relaxed copy function
(copying from the same guest memory already validated the virtual
address range).
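A simplified model of the problem, assuming (for illustration) that
creating a continuation rewinds the guest's return address over a
2-byte hypercall instruction and that cancelling advances it again. All
names are hypothetical; the point is that cancellation is only done
when a continuation was actually created:

```c
#include <stdbool.h>
#include <stdint.h>

struct guest_regs { uint64_t rip; };

/* Simplified model: re-execute the 2-byte hypercall instruction. */
static void create_continuation(struct guest_regs *r) { r->rip -= 2; }
static void cancel_continuation(struct guest_regs *r) { r->rip += 2; }

/*
 * Handle 'todo' items, at most 'budget' per invocation; 'fail' models an
 * error found afterwards (e.g. a failed copy back to the guest).
 */
static int do_op(struct guest_regs *r, unsigned int todo,
                 unsigned int budget, bool fail)
{
    bool continued = false;

    if (todo > budget) {
        create_continuation(r);
        continued = true;
    }
    if (fail) {
        /* Unconditional cancellation would push rip 2 bytes too far. */
        if (continued)
            cancel_continuation(r);
        return -1;
    }
    return 0;
}
```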

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

README: adjust gcc version requirement

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

libxl: Fix bug in libxl_cdrom_insert, make more robust against bad xenstore data

libxl_cdrom_insert was failing to initialize the backend type,
resulting in the wrong default backend. The result was not only that
the CD was not inserted properly, but also that some improper xenstore
entries were created, causing further block commands to fail.

This patch fixes the bug by setting the disk backend type based on the
type of the existing device.

It also makes the system more robust by checking that it has a valid
path before proceeding to write a partial xenstore entry.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xl: xl.conf(5): correct advice re autoballooning vs. dom0_mem.

The advice was backwards: you should really disable autoballoon if you
use dom0_mem. Also add a reference to the command-line docs.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

README: add Pixman as build dependency

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

libxl: fix a variable underflow in libxl_wait_for_free_memory

When xl is asked to create a domU and there is not enough free memory,
autoballooning is used to reclaim memory from dom0. During ballooning,
a loop in libxl_wait_for_free_memory() waits until enough memory is
available to create the domU.

But because of a variable underflow, the loop can finish too soon and
xl eventually aborts with the message:

xc: error: panic: xc_dom_boot.c:161: xc_dom_boot_mem_init: can't allocate low memory for domain: Out of memory
libxl: error: libxl_dom.c:430:libxl__build_pv: xc_dom_boot_mem_init failed: Device or resource busy
libxl: error: libxl_create.c:901:domcreate_rebuild_done: cannot (re-)build domain: -3

The underflow happens when freemem_slack is larger than
info.free_pages*4, because the result of the subtraction is implicitly
converted to an unsigned int to match the type of memory_kb.

Add an extra check for this condition to fix the problem.
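The arithmetic can be reproduced in isolation. In this sketch the
function names and variable widths are illustrative rather than libxl's
exact declarations. With 100 free pages (400 KiB) and a 1000 KiB slack,
the unguarded subtraction wraps to a huge unsigned value and wrongly
reports enough memory:

```c
#include <stdbool.h>
#include <stdint.h>

static bool enough_free_buggy(uint64_t free_pages, uint32_t slack_kb,
                              uint32_t memory_kb)
{
    /* Wraps around if slack_kb > free_pages * 4. */
    uint32_t avail_kb = (uint32_t)(free_pages * 4 - slack_kb);

    return avail_kb >= memory_kb;
}

static bool enough_free_fixed(uint64_t free_pages, uint32_t slack_kb,
                              uint32_t memory_kb)
{
    uint64_t free_kb = free_pages * 4;   /* 4 KiB pages -> KiB */

    if (free_kb < slack_kb)              /* the extra check */
        return false;
    return free_kb - slack_kb >= memory_kb;
}
```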

Signed-off-by: Ronny Hegewald <Ronny.Hegewald@online.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

xenstore-chmod: handle arbitrary number of perms rather than MAX_PERMS constant

The MAX_PERMS constant (16) is too small on some occasions, e.g. when
there are more than 16 domUs on one hypervisor (which is easy to
achieve) and one wants to run xenstore-chmod on PATH for all of them.
So remove the MAX_PERMS limitation and accept an arbitrary number of
perms.
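A sketch of the dynamic approach: size the permissions array from the
number of arguments instead of a fixed MAX_PERMS. The struct and parser
are hypothetical; the spec format (one of 'n', 'r', 'w', 'b' followed
by a domain ID) follows xenstore's permission strings:

```c
#include <stdlib.h>

struct perm {                 /* hypothetical permission entry */
    unsigned int id;
    char mode;
};

/* Return a malloc'd array of nr entries, or NULL on bad input/OOM. */
static struct perm *parse_perms(int nr, const char *const specs[])
{
    struct perm *p = malloc((size_t)(nr > 0 ? nr : 1) * sizeof(*p));

    if (!p)
        return NULL;
    for (int i = 0; i < nr; i++) {
        char *end;
        char m = specs[i][0];
        unsigned long id;

        if (m != 'n' && m != 'r' && m != 'w' && m != 'b') {
            free(p);
            return NULL;
        }
        id = strtoul(specs[i] + 1, &end, 10);
        if (end == specs[i] + 1 || *end != '\0') {   /* need a domid */
            free(p);
            return NULL;
        }
        p[i].mode = m;
        p[i].id = (unsigned int)id;
    }
    return p;
}
```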

Signed-off-by: Chunyan Liu <cyliu@suse.com>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>

x86/time: fix scale_delta() inline assembly

The way it was coded, it clobbered %rdx without telling the compiler.
This generally didn't cause any problems except when there are two
back-to-back invocations (as in plt_overflow()), since in that case the
compiler may validly assume that it can reuse, for the second instance,
the value it loaded into %rdx before the first one.

While at it, also properly relax the second operand of "mul" (there's
no need for it to be in %rdx, or in a register at all), and switch away
from using explicit register names in the instruction operands.
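A sketch of the corrected constraint pattern, modelled on the
description above rather than copied from Xen: %rdx is declared as an
output so the compiler knows it is overwritten, the multiplicand uses
an "rm" constraint, and operands are referenced by number. The struct
layout (a shift plus a 0.32 fixed-point multiplier) is an assumption,
and an __int128 fallback is included for 64-bit non-x86 compilers:

```c
#include <stdint.h>

struct time_scale {
    int shift;           /* pre-shift applied to the delta */
    uint32_t mul_frac;   /* 0.32 fixed-point multiplier */
};

static inline uint64_t scale_delta(uint64_t delta,
                                   const struct time_scale *scale)
{
    uint64_t product;

    if (scale->shift < 0)
        delta >>= -scale->shift;
    else
        delta <<= scale->shift;

#if defined(__x86_64__)
    /* mulq: %rdx:%rax = %rax * op; shrdq keeps bits 95:32 in %rax. */
    __asm__ ("mulq %2; shrdq $32, %1, %0"
             : "=a" (product), "=d" (delta)   /* %rdx is an output */
             : "rm" (delta), "0" ((uint64_t)scale->mul_frac));
#else
    product = (uint64_t)(((unsigned __int128)delta * scale->mul_frac) >> 32);
#endif
    return product;
}
```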

Signed-off-by: Jan Beulich <jbeulich@suse.com>
Acked-by: Keir Fraser <keir@xen.org>

xend: enable environment passing in xPopen3

In changeset 19990:38dd208e1d95 a new parameter 'env' was added to
xPopen3, but no code was added to actually pass the environment down to
execvpe. Also, the new code was unreachable.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Campbell <ian.campbell@citrix.com>